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Abstract 

A discussion of the LMS adaptive filter relating to its convergence characteristics 
and the problems associated with disparate eigenvalues is presented. This is used to 
introduce the concept of proportional convergence. A novel approach is used to ana- 
lyze the convergence characteristics of block frequency- domain adaptive filters. This 
leads to a development showing how the frequency- domain FIR adaptive filter is easily 
modified to provide proportional convergence. These ideas are extended to a block 
frequency- domain HR adaptive filter and the idea of proportional convergence is ap- 
plied. Experimental results illustrating proportional convergence in both FIR and HR 
frequency- domain block adaptive filters is presented. 

1 The LMS Adaptive Filter 

We first present Widrow’s LMS adaptive filter, analyzing it’s mean convergence fol- 
lowing his approach in [5]. The equations describing an FIR filter in both scalar and 
vector notation are 

N - 1 

Vj = £ = wTx o = X f W 

»=0 

with Xj = [as,-, Xj-. u a : i _ 2 , . . . , *,-(^- 1 )] and W T = [u> 0 , u>i, . . . , wjv-i]- 

The goal of an adaptive filter is to automatically adjust the weights W so that the 
difference between the output {y n } and some desired signal {d n } is a minimum in a 
mean square sense. That is, with ej — dj — yj = dj - W T Xj = dj - XjW 

E (( ej ) 2 ) = E (dj) - 2 E (djXf) W + W T E (XjXf) W 

where E () represents the expectation operator. With P zc * = E { djXj ) (the cross- 
correlation between {d n }) and {z„} and R xx = E^XjX^j (the autocorrelation of 
{ sc n } ) , the mean square error £ can be written 

i = E (dj) - 2 P? d W + W t R xx W (1) 

Since the MSE (mean square error) is a hyperquadratic function of the weights, in 
essence a bowl shaped surface with a single global minimum, the minimum can be 
sought using simple gradient search techniques. That is 

W j+1 = W j -f i V j 


3.4.2 


where ft is a scalar constant that controls the rate of adaptation, and Vj is the gradient 
of the error surface with respect to the weights evaluated at time j. 

dElef) 

Vj = = ~ 2P * d + 2R ** W ( 2 ) 

Setting gradient to 0 yields the optimum choice of W for mi ni m um mse 

W* = R~ x lP xd (3) 

Since R xx is positive definite, it is non-singular and it’s inverse exists. This equation 
is the Weiner-Hopf equation written in matrix form, thus W* represents the optimum 
Weiner filter. 

Obviously, if R xx and P xd are known exactly (3) gives the optimum weights and 
requires the inversion of the autocorrelation matrix, but this need only be done once. 
In one approach to adaptive filtering, a large number of data samples are processed to 
obtain an accurate estimate of R xx and P xd , then R xx is found and W* is computed and 
the data are filtered using W*. This approach requires a large amount of data storage, 
imposes a significant processing delay and works well only for stationary signals. The 
LMS approach uses the gradient method presented above but to minimize the storage 
requirement, the gradient is estimated on the basis of no more data than are present 
in the filter itself. That is, at iteration j, we estimate R xx ta XjXj and Pxd ~ djXj. 
With this, the estimated gradient at iteration j becomes Vj = —2 ejXj and the weight 
update becomes 

W j+1 = Wj + 2 m Xi (4) 

Admittedly, these estimates of the gradient are noisy, but if /i is chosen to be small, 
the error in each estimate will contribute little to the final solution. 

To analyze the convergence characteristics of the LMS adaptive filter, consider an 
ensemble of adaptive processes that all begin with the same initial weight vector Wo- 
Also, the inputs to each adaptive filter are drawn from the same statistical populations. 
If we take the expected value over the ensemble of the weight update (4), we have 

E (W j+1 ) = E ( Wj ) + 2ft ( P xd - R XX E (Wj)) (5) 

provided that Xj and Wj for each adaptive process are uncorrelated. We first note, 
that in the mean, the gradient estimate equals the true gradient, so, it the adaptive 
filter converges, the weight vector converges to the Weiner solution. We can simplify 
(5) if we translate the weight vector coordinates so that the optimum weight vector in 
the new coordinate system is 0. That is, let the new weight coordinates be represented 
by the vector V where Vj = Wj - W*. Letting Wj be Vj + W* and recognizing that 
P xd = R XX W*, the above expression becomes 

E(V j+1 ) ={I-2fiR xx )E{Vj) (6) 

= (I-2fxR xx y +1 V 0 

This is a geometric equation whose convergence depends on R xx and fi. If we now 
rotate the weight space so that the axes fall along the principle axes of R xx , the above 
equation will be transformed into a set of uncoupled scalar equations. Since R xx 
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is symmetric, it can be orthogonally diagonalized as in A = Q~ 1 R X xQ where A is a 
diagonal matrix composed of the eigenvalues of R xx , that is A = diag[Xo , Aj, . . . , A/v-i] 
and Q is the orthonormal modal matrix of R xx , that is the columns of Q are the 
normalized eigenvectors of R, each of which is distinct and orthogonal to the other 
eigenvectors. Thus QQ T = I and Q _1 = Q T . Now, substituting R xx = QAQ T into 
(6) and premultiplying both sides of the equation by Q T gives 

E(vj) = (I-2pAyVl (7) 

where V' = Q T V is a rotation of the translated weight coordinates into the principle 
axes of R xx . Now, because A is diagonal, the above equation decouples into a set of 
scalar equations 

v Pj = U ~ 2/xA P ) J ' v' po 

with v' p representing the pth element of V'. We will refer to {u p } as modes of the 
adaptive filter. It is evident that for E (vj^ and hence E (Wj) to converge, we must 
have 

|1 - 2pAp| < 1 Vp 

which will be satisfied if 0 < p < 1/A max since 0 < A p Vp because R xx is positive 
definite. It is important to note that this only guarantees convergence in the mean 
over an ensemble of adaptive processes. Note here that choosing p = 1/2A P will give 
the fastest convergence for mode p. In fact it should converge in the mean in one step, 
but unless A p = \ max , other modes will not converge, p must be typically be chosen 
< l/2A mox because the gradient estimate is itself noisy. Often, the eigenvalues are 
not directly available. Since R xx is positive definite, A max < (X^ilo 1 and 

since tvR xx is just the average power in {«„}, p is conveniently chosen as l/trA xx .) 

It should be clear that A max limits rate at which the filter can converge. The mode 
associated with A max converges the fastest and that the mode associated with A m ;„ 
will be the last to converge. That is, the overall convergence of the LMS algorithm 
is controlled by the spread in the eigenvalues of R xx . This aspect has attracted a lot 
of attention among researchers attempting to speed up LMS convergence. If all the 
eigenvalues were equal, the LMS algorithm could be made to converge at the fastest 
possible rate, taking into account that the overall rate must still be relatively slow to 
compensate for the fact that we have only estimated the gradient. 

Another related problem we encountered [13] has to do with the non-proportional 
convergence of the modes of the adaptive filter, particularly in non-stationary envi- 
ronments. The most highly correlated modes of R xx will have the largest eigenvalues 
and these modes will be resolved first. Since the eigenvalues of R xx are related to the 
power spectral density of X, this essentially means that the spectral components of 
X that have the most power will be resolved first. If the problem is non-stationary, 
the lower power components may never be resolved. This effectively alters the power 
spectral density of the output process in a way that may be undesirable. For example, - 
consider processing speech for noise cancellation using the line enhancer configuration 
of the adaptive filter. The spectral components of speech having the highest power are 
those in the low frequencies, so these will be resolved first. The high frequencies may 
never be resolved. In this case the non-proportional convergence has effectively added 
a low-pass filter. This is very undesirable as a large percentage of the intelligibility of 
speech is carried in the higher frequencies. 
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One way to achieve both goals of faster and more proportional convergence is to 
normalize (7) by replacing y with //A -1 . This amounts to replacing a scalar-/* by a 
vector or a matrix y. If we do this, and follow our development back through the 
rotation and translation steps, we arrive at the update equation 

W j+1 = Wj + 2 yR-lejXj (8) 

A number of algorithms have been developed using this approach which is essentially 
Newton’s method for the minimization of a quadratic surface. Most of these are 
referred to as “self-orthogonalizing” adaptive filters and they attempt to estimate R~£ 
at each iteration. [26,6] [5,20] This results in a much larger computational load than 
might be desired. Not only is there the additional load of estimating R ~* , but also an 
extra matrix multiply is required between it and the gradient estimate. 

Another way to do the same thing, and the approach we pursue in this paper, is 
to transform the input data into the rotated space, perform the update and output 
computations in the rotated space and then inverse transform the output data. To 
see how this might work, let us premultiply both sides of (8) with Q T . With R ~ £ = 
QA~*Q T , this becomes 

Wj +1 = W] + 2yA~\X' } (9) 

where Wj = Q T Wj and Xj = Q T Xj represent vectors in the rotated space. Because A 
is diagonal, we shall often refer to yA* 1 as a vector-y. Note that each of the modes of 
the adaptive process decouple and we end up with N one-weight adaptive filters with 
the adaptive gain equal to y/\ p for the pth mode. This can also be seen in the fact 
that A = E (xjXj r ), so A p is just the power in the pth component of Xj averaged 
over the iterations j. This approach is very attractive in that it requires no more 
computation than the usual LMS time-domain approach, yet it promises improved 
convergence rates as well as more proportional convergence. The only obstacle to be 
overcome is the problem of transforming X into the rotated space. We could attempt to 
estimate Q, but this is essentially the same as the Newton’s methods discussed above. 
What is needed is an orthogonal transform that decouples the spectral components 
of X. Several candidates have been studied, the most prominent among them being 
the Discrete Fourier Transform (DFT). Efforts to improve convergence in this way 
developed synergisticly with efforts to reduce the number of computations required in 
the adaptive filter by using the Fast Fourier Transform (FFT) to implement high speed 
convolution and correlation. 

Yet another recent approach [30] uses a DCT to estimate {A p }, orders the set by 
magnitude and then assigns to y a sequence of values related to the reciprocals of the 
ordered set {A p }. Effectively, once the mode associated with A max is converged, y can 
be increased to speed the convergence of the mode associated with the next largest 
eigenvalue until it too is converged and so on. 

2 The Block Adaptive Filter 

A block adaptive filter using FFTs to perform fast convolution and correlation was 
proposed nearly simultaneously in 1980 by Clark et. ad. [9,14,18] and Ferrara [10]. 
Clark also presented the effect of block processing on the convergence and misadjust- 
ment of the adaptive filter as opposed to point by point processing. Waltzman and 
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Schwartz [1,3] had earlier applied the use of the FFT to the automatic channel equal- 
izer. Not only did they show that the filter could be run with fewer computations, 
but by adapting the weights in the frequency domain they were able to obtain tighter 
bounds on A max and A m ; n and hence a more accurate setting for p. The following 
discussion is based largely on the approach used by Clark et. al. 

We note that the convolution yj = W T Xj = XjW can be written in matrix form 


as 


Vj 

Vj+i 
Vj+ 2 
yj + 3 

. 


Xf 1 



Xj+1 

Xf+2 


Wq 

Wi 

Xj+3 


• 

• 


w N-l 

- 



( 10 ) 


or as 

Yi = XiW 


( 11 ) 


where Yj is an L element vector [yiL,VlL+u • • • , y(i+i)L-i] T and Xi is an L by N matrix 
whose rows are the transposes of the vectors Xj = [xj, »j-i , . . ., for j = 

( IL , IL + 1, ..., (/ + 1)X — 1). We also similarly define desired and error vectors Di and 
Ei so that Et = Di - Yi = Di — XiW. Thus, instead of computing the error for every 
input point, we only do so once every L points. For this approach, we are interested 
in minimizing the block mean square error £b- 


Zs = E(jEfE^ 


Under the assumption that the inputs are stationary, it is easy to show that the block 
mean square error will be the same as the mean square error in the unblocked filter. 
Further, the Weiner optimum weights will be the same in both cases. 

We also only update the weights once per block. Following the same approach 
as for the unblocked case, we approximate the the gradient of the £b using only the 
information available at that iteration. That is £b ~ 1 / LEfEi. (We note that this 
is L times the information that was available to the unblocked filter so we suspect 
that this gradient estimate is not as noisy as the estimates generated by an unblocked 
filter.) This leads to a block weight update equation 

Wj+i = W, + ^ XjE x (12) 


where ps is the block adaptive gain constant. 

By a similar analysis as applied to the unblocked filter, it can be shown that conver- 
gence is guaranteed in the mean if 0 < ps < 1/A mox which is the same condition as for 
the unblocked case. However, we must set pb = Lp for the blocked and unblocked fil- 
ters to converge at the same rate. Depending on L and how much smaller than l/2A max 
p is chosen to compensate for the larger gradient estimate noise in the unblocked filter, 
this may or may not be possible. So the blocked filter may be constrained to converge 
more slowly. This is particularly true in the case of highly disparate eigenvalues. We 
need to remember here that this discussion applies to a scalar pb and that when we 
eventually introduce a vector -p, the convergence of the blocked algorithm can be made 
faster than that of the unblocked filter. There is, however, another factor that may 
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limit how large L can be. In applications where {a: n } is slowly non-stationary, L must 
be small enough so that over several successive Z-point blocks (the number depending 
on the convergence rate) {z n } is approximately stationary. Otherwise, the filter would, 
at best, not properly track the non-stationarity or, at worst, become unstable. 

Let us next consider how to apply the FFT to reduce the number of computations 
required in the block adaptive filter. 

The AT-point DFT of a sequence x(n) is computed 

X{k) = £ i(n)W»* 

n= 0 

where Wn = e~^ and IF# for n = 0, 1, 2, . . . , IV - 1 are the N roots of unity. 
Similarly, the inverse transform is given 

*< B )=j7 

k=0 

These transformations can be written in matrix form 

* = VNTX and X = ~T* t X 

y/. N 

where X — [*(0), A'(l), . . . , X{N- l)] r , X = [z(0), z(l), . . . , x(N - 1)] T , T is a matrix 
whose ( k,n)th element is W^/s/N, and * represents the complex conjugate and T 
represents the transpose operation. We note first of all that T is symmetric and that 
T* — so that T is a unitary transform. Also, it can be shown [25,2,21] that if C 
is a circulant matrix, T will diagonalize it. Further, if the diagonalization is expressed 

A c = TCT* (13) 


then the eigenvalues of C and the elements of Ac will be given by the DFT of the first 
column of C . 

With this background, let us now illustrate how the DFT can be used to implement 
the convolution in (11). First we rewrite (11) as follows 


(14) 


• 


• 

’ Wi ' 

Y x 


x x • 

0 


where the -’s represent arbitrary elements which do not affect the equation, 
simply, 


More 


If Xf can be made circulant, then the DFT can be used to diagonalize it. This can be 
done by defining Xf to be an M x M circulant matrix whose first column is the M- 
point vector [®ii,_(iv-i) ? • • • i xil-i, xil, x lL+\i • • •> Each successive column 

is formed by circularly down shifting the previous column by one sample. The M point 
augmented weight vector W a formed by padding the jV-point W with N - M zeros. 
The result will be another augmented vector Yf whose last L points is the vector Fj. 
With I — we can write 


Y, a = 


y/M 


T*TXfT'sfMTW a 
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which, with the circulancy of Xf, can be written 

Y, a = DFT - 1 {DFTdzn-w-v , . . . , x lL , . . . , * (/+1 ) l-i]) ® DFT{W a )} (15) 

where ® is a point-by-point multiply of two vectors. Consider the following example 
with l = 0. 


yi-N 


*1 -N 

XL- 1 

* • • xl-n+1 

XL-N 

• • • x 3 - N 

Xi-N 


w 0 

h-N 


X 2 -N 

Xl-N 

• • • Xl-N+2 

&L—N+1 

• ■ • ®4-JV 

X3-N 


W-L 

yo 


xo 

X-l 

X\ -N 

XL- 1 

x 2 

Xl 


Wn-1 

2/1 



Xo 

X 2 -N 

Xl-N 

x 3 

x 2 


0 

VL-2 


XL- 2 

XL- 3 

••• Xl-N-1 

XL-N- 2 

’ • • Xl-N 

XL- 1 


0 

VL- 1 


XL- 1 

XL- 2 

Xl-N 

XL-N-1 

X 2 -n 

*l-iV _ 


0 


(16) 


From the above, we can see that the last L — M — ( N — 1) points y 0 to yL-i represent 
points from the linear convolution of x(n ) with W. The previous N — 1 points are 
incorrect and are discarded. The next segment of z(n) to be processed will overlap 
the previous by N - 1 points, ie., . . . , *r_i, il, aii+i, • • •> * 2 i-i] This will 

produce another valid L points which are abutted with the previous set. This is the 
standard “overlap and save” approach to discrete convolution using the DFT. 

Waltzman [1,3] and, later, Ferrara [10], observed that the update (12) involved a 
cross-correlation between X; and Ei and that this also could be sped up by applying 
FFTs. To see how this is done we first observe that by taking the complex conjugate 
transpose of (13), we have 

A* c = TC t T* 

Now, as before, we augment the vectors in (12). We use the same circulant matrix Xf 
and augmented vector W a . This requires the use of an augmented vector Ef, whose 
first N — 1 points are 0 and whose last L points form E[. That is 


' w l+1 ' 


’ Wi ' 

-f 2 — 

'• XT' 

0 

• 


• 

L 

• 

Ei 


(17) 


or, simply, 

Wf +1 = Wf + 2^~-Xf T Ef 

Li 

This is fortunate because it gives us a consistent set of augmented vectors. Wf and Xf 
are the same for both the update and convolution equations. Also, Ef — Df — Yf, if 
we define an appropriate augmented vector for Di. Next, with the fact that F*F = I, 
the augmented update equation becomes 

Wf +1 = Wf + 2~ -y=F* FXf T F* 

L/ \JM 


VMFEf 


Since XXf T T* is given by the complex conjugate of the DFT of the first column of 
X“, the update equation ultimately becomes 

Wf + 1 = Wf + 2^-DFT- 1 { (DFTdx ^.,), . . . , x lL , . . . , as (f+x)1 ;_i]))* (18) 

®DFT(Ef)} 
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Figure 1: The Block Frequency Domain LMS Algorithm 

Remembering that we must set the last M — N points of to zero before per- 

forming the next convolution, (15) and (18) describe the block frequency domain LMS 
algorithm. This process is summarized in Figure 1. Since the FFT is a linear opera- 
tor, there is considerable flexibility in the structure of the algorithm. If we move the 
window to a position just before the adaptive gain multiply, we get a structure that 
reduces the number of multiplies and adds in the weight update calculation. Another 
possible rearrangement is shown in Figure 2. Here, the adaptive gain takes place in the 
frequency domain and opens up the possibility for selective gains for each frequency 
bin. In this structure, the FFT after the window could be moved above the weight 
recursion to reduce the number of additions. 

Choosing the FFT length to be twice the number of weights (M = 21V) allows the 
filter to generate L = N + 1 valid output points at each iteration. Taking advantage of 
symmetries, we can argue [29] that a 21V radix 2 point FFT requires 21Vlog 2 (lV) — 41V 
real multiplies and 31Vlog 2 (lV) + 2 N - 12 real additions. Working from Figure 1, we 
see that there are 5 FFTs. Using the symmetry in the FFT of a real sequence, each ® 
operation requires IV — 1 complex multiplies and 2 real multiplies. The operations 
each require N real additions. The adaptive gain requires N real multiplies. Overall, 
to generate N + 1 output points requires 101Vlog 2 (lV) — 111V — 4 real multiplies and 
15IVlog 2 (IV)+16IV— 64 real additions. To produce N +1 points from an IV-weight time- 
domain adaptive filter requires 21V 2 + 3 N + 1 real multiplies and 2 N 2 real additions. 
Of course, the time domain filter performs a weight update for each output point 
produced and, although we could alter it to do an update only once every N + 1 points 
as the block filter does, would not result in any computational savings. The ratio of 
complexity of the frequency-domain block adaptive filter to the time-domain adaptive 
filter for several values of N is given in the following table. 

We need to remember that while we have achieved some computational savings, 
that is not our only objective. Next we consider the issue of proportional conver- 
gence. Much of what follows is motivated by Picchi and Prati’s presentation of a 
self-orthogonalizing adaptive equalizer in [20], however our approach is novel and re- 
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Table 1: Ratio of Frequency Domain to Time Domain Calculations 




3.4.10 


quires fewer approximations. 

In order to formalize the sectioning and zero-padding operations we define two 
projection operators II/y and Tl. Tx is constructed from an M x M identity matrix 


whose first N — 1 diagonal elements have been set to 0. Thus Tx = 


0 

I 


where I 


L J 

is Lx L. 11* is similarly constructed but in this case the first N diagonal elements are 
left at 1 with the remaining L — 1 elements set to 0. When Tx premultiplies a vector, 
it projects its first N — 1 points to 0. When premultiplying a matrix, it sets the first 
N — 1 rows of the matrix to 0. Postmultiplying a matrix by Tx is equivalent to setting 
the first N — 1 columns to zero. IIjv has similar properties but applies to the last L — 1 
points of a vector or rows or columns of a matrix. With this (17) can be restated 


W? +1 = w t + 2^-n N X? T Et (19) 

which formalizes the fact that the weight vector must be updated in such a way so 
that last M — N elements of W] 1 are always 0. 

Now, to consider the convergence of the augmented blocked filter, we take the 
expected value of (19) over an ensemble of adaptive processes just as we did in for the 
unblocked filter. This gives 


E (W? +1 ) = E (Wi> + 2^n n e (x? t e?) 

With Ef = Df — T l X?W? and with the assumption of independence between X“ and 
Wi the above expands to 


E (W? +1 ) = E {Wi) + 2^11* [E (X? T D?) - E (X? T T L X?) E (W t )) 
Observing that 


E (x, oT Z>?) = E 
and that 
E 


• X T 

• xl 


0 

Di 


M 


Pxd 

1 /LE(x?Di) 


= LPZd 


• Xf 

’ 0 0 ' 

' 

\_ r tf*x 1 iLElxfxy 

• x c r 

0 I 

Xi x c 

/ [l ILE(x^Xi) 1 /LE{X?X C ) 


or, in augmented matrix notation, 

E (X? T T L X?) = LR° XX 


where we have chosen X c to represent a part of the circulant matrix X*. We can now 
write 

E (W? +1 ) = E ( Wt) - 2fi B U N {R% X E {Wt) - P? d ) (20) 

This is just the blocked and augmented analog of (5) and the projection operator and 
the quantity in parentheses is the augmented form of the gradient (2). To find the 
optimum weight vector, we set the gradient to 0, but we note that this imposes no 
constraint on the augmented components of the gradient since II jy already forces them 
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to zero. All that is required is P x d = R XX W*. That is, the projection operator II// 
guarantees that W* a is W* (the Weiner optimum weight vector) padded with L - 1 
zeros, which is precisely what we desire. If II// were not present, as is done in Mansour 
and Gray’s unconstrained filter [15,16], setting the gradient to zero would also require 
that 

E (x?Xi) W* = E (x?Di) (21) 

which is only true if Di = XiW* + $ where $ represents samples of a white gaussian 
noise sequence. This will be true if {zj} and {dj} are related by a simple FIR transfer 
function whose order G is less than or equal to N . Provided this is the case, we can 
delete ITjv which also allows us to drop two FFTs. The only drawback is that while 
the last L — 1 points of W a start out as 0, gradient noise will allow small non-zero 
quantities to enter in which will lead to a slightly higher mean square error (MSE). 
When M > G > N , N-G of the weights will tend toward non-zero values and the filter 
will be able to achieve a lower mean-square error than a constrained N weight filter. 
The lower MSE comes at the expense of introducing circular correlation products; 
however, if N - G is small with respect to L this may be tolerable. The amount of 
error introduced has yet to be studied extensively. The same is true when the transfer 
function between {d n } and {«„} is IIR where the unconstrained filter will attempt 
to use all M weights. The unconstrained filter also fails when it is configured as a 
line enhancer where xj = z~ s dj. Because the filter is unconstrained it can find the 
non-causal solution for which the mean square error is zero - unfortunately, this is a 
useless solution. 

While we could continue to pursue the mean convergence of augmented blocked 
algorithm, it is exactly the same as that of the blocked algorithm. What is important 
to note here is that, although the above expectations have explicitly determined the 
augmented portions of P xd and R% x , these portions contribute nothing to E 
as a result of the projection II// and the zero padding of Wf (also due to II//). This 
will later allow us to augment R xx in such a way that R xx is easily invertible. 

As we saw in Section 1, we can both speed up and orthogonalize the convergence 
of the LMS adaptive filter by replacing /z with pR xx - Writing (8) using augmented 
vectors consistent with those we have already defined gives 

W?+ 1 = W? + 2 f ~r-K N (R-^) a Il N X? T E? 

where ( R xx ) a is an M X M augmented matrix whose upper left corner is composed of 
the N X N matrix R xx . The remaining elements of the matrix are arbitrary. Pre-and 
post-multiplying (R xx ) a by 11// guarantees that they do not enter into the computation. 
Now, if we again take expectations over an ensemble, we will have 

e (w? +1 ) = e <wi> - 2m 3 n N (R-Z) a n N (r° xx e (w,) - pu) (22) 


But, 

n N (RZZ) a n N R a xx = n N 

= n N (R° xx )- l R a xx 


and, while we have II// P xd = II//IZ“ x W* a , since the augmented portion of P xd is not 
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constrained, we can let P* d = R% x W* a . With this we have 

n N (R^rn N PZ d =n N (R~zrn N R% x w* a 

= n N w* a 

= U.ff(R xx )~ 1 Rl x W* a 
= n N (R° xx )-'P2 d 

Thus, in the mean , IIjv(.R~ x ) a IItf can be replaced by IIjv(.R“ x ) -1 and the update 
equation becomes 

W? +1 = W? + 2^n N {R a xx )-'XfE? (23) 

Now, we pre-multiply with \[MP and insert = I at appropriate points. Thus, 

W/* +1 = W/* + m. N T*‘lVy-T{R a xx )- x T'3 : XfT*e? 

1 j 

where W and £ represent DFTs of W a and E a respectively. Next, if R xx can be made 
circulant, T will diagonalize it. That is A“ = !FR* X T* and A a_1 = F(R xx )~ 1 J r *. 
With this the update equation becomes 

W“ +1 = W,° + J r n N F'2^-A a - 1 ?X? T F t £t l (24) 

This results in the structure shown in Figure 3. This filter [28] is exactly the same as 
the fast implementation of the block LMS adaptive filter shown in Figure 2, except fi 



Figure 3: The Block Frequency Domain LMS Algorithm with a Vector fi 

has been replaced by ^(A“) _1 . Since this simply represents a point-by-point multiply 
of two vectors, we see that this algorithm requires no more multiplies nor adds to 
implement than does Figure 2 1 . The only additional computational load arises in 
estimating (A°) -1 . 


1 We need to note that Figure 2 requires N more real multiplies than did Figure 1. 
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As we indicated above, we must augment R xx in such a way as to make R% x 
circulant. If we let r n = r_ n = E (xkXk+n) represent samples of the autocorrelation of 
x(n), then 



ro 

*•1 

r 2 



fi 


r i 

• • • rjv -2 

Rxx = 

r 2 


ro 

••• rN-3 


. ^AT-l 

XN-2 

?N-3 

• • • ro 


Next, following the approach used by Picchi and Prati in [20], we define R xx as a 
circulant matrix whose first column is given as 

[r 0 , ri, r 2 , . . . , r/v-x, rjv_ 1, r/v- 2> ■ • • » 7 ’2> »*i] 

and whose successive columns are given by circularly down-shifting each preceding 
column by one sample. It is easy to see that the resulting 2 N — 1 X 2 N — 1 matrix 
contains the N xN matrix R xx in its upper left corner. Defining R xx this way imposes 
restrictions on L, namely we require L = JV . L can be chosen greater than N by 
defining the first column of R xx and inserting L — N zeros as follows: 

[j'o, *"ii • j TN— i» 0 , . . . , 0 , ry_i, r^_2, • • • , r 2 , **i] 

However, the only way L can be smaller than N is if r n = 0 V n > G, for some G < N. 
In this case the first column is given as 

[ r o, **1, r 2 , . . . , r< 3 - 1, 0 , . . . , 0 , t*g-x, r G _ 2 , • • • , r 2 , r x ] 

In the above there are G - N zeros, so the overall length is M = N G — 1, and thus 
L = G. 

Provided we can satisfy these restrictions on N and L we can easily diagonalize 
R xx using the DFT. That is, 


A? = 



(25) 


where A“ are just the diagonal elements of A°. From this expression we see that X? axe 
just samples of the power spectral density of {xy}, P IX (w) convolved with the Fourier 
transform of a rectangular window of length M. Thus, the problem of estimating A° 
is simply one of spectrum estimation. 

A well known method for spectrum estimation is Bartlett’s procedure [23]. Using 
this approach gives 


A° = diag 


1 

K + 1 


K 

£*/ a ®*r 


1=0 


(26) 


where A) a is the DFT of the first column of Xf. That is, at block K, A? is just the 
average over K + 1 blocks of the power in the *th bin of the DFT of the reference input. 
E is a sample of P xx (u;) convolved with the Fourier transform of a triangular 
window of length M evaluated at u> = ^p-. Although this is not exactly equivalent to 
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(25), it provides a simple estimator. The estimator given in (26) was first proposed by 
Ogue et. al. in [19] and later by Picchi and Prati in [20] for use in an adaptive channel 
equalizer. 

For a general adaptive filter, especially in the case of slowly non- stationary inputs, 
an exponential average might be more appropriate. That is, 

a? + 1 = rb Af + r+^3 dia9 [xr ® xr] (27) 

where (3 is chosen to reflect the rate of non-stationarity. We need to be careful that 
any of the Af do not become too small or they will cause the digital filter to overflow 
when inverted. One way to prevent this is to constrain them to be larger than some 
lower limit. This is equivalent to adding a noise floor to x before the FFT is taken 
leading to the A estimate but it doesn’t require as much computation. 

In summary, we have found a way to use the FFT to easily estimate the eigenvalues 
of R xx and used this to orthogonalize the adaptive process. If we had taken a purely 
heuristic approach, viewing the frequency-domain filter as a set of M independent 
one-weight adaptive filters operating on the bins of M- point DFTs of x(n) and d(n) 
we would have arrived at the same estimates as (26) and (27). That is, the adaptive 
gain of each filter is determined only by the power in the associated bin. Of course, 
adjacent bins of the DFT are not necessarily independent of each other. Hodgkiss 
and Nolte [4] argued that the covariance between two Fourier series coefficients of a 
process is approximately zero if the power spectral density of the process is relatively 
smooth. This argument can be extended to the variance between DFT coefficients, 
since these are just aliased Fourier series coefficients [24]. In this context, adjacent 
DFT coefficients are approximately independent of each other if the power spectral 
density changes slowly over the bandwidth of the DFT bins which will be the case if 
the process contains no isolated periodic components and we choose M large enough. 

These arguments correspond to arguments made above. If N, and hence M, is 
greater than G, that is, if r n = OVn > G , then the rectangular window disappears and 
the Af in (25) are exactly samples of P xx { u>). Further, the difference between using a 
rectangular window and a triangular one diminishes as M increases and the estimate 
in (26) approaches that in (25). 

We also note at this point that the filter in Figure 3 degenerates to Narayan and 
Peterson’s Transform Domain filter (TDF) [11,17] if the overlap is set so that at each 
iteration only one output point is produced. That is L = 1 and M = IV. In this case, 
the inverse DFT used to generate Yf only produces one usable point. Since this one 
point is the first point in the vector, its computation only involves the average of the 
points in 3^, by definition of the DFT. Although the TDF requires more computa- 
tion than the LMS adaptive filter, it has the advantage of proportional convergence. 
Narayan [17] also proposed other transforms including the DCT to be used in place of 
the DFT. 

Another frequency- domain adaptive filter is the Dentino filter [8]. The Dentino filter 
simply implements (8) using the DFT to orthogonalize the input data. It ignores the 
fact that the DFT actually implements a circular convolution, so none of the vectors are 
augmented and there are no projection operations. Though the filter output suffers 
from circular convolution products, occasionally the meaningful information can be 
extracted from the converged weight vector. Maiwald et. al. [7] observed that for the 
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case where the input is truly cyclic with period equal to the length of the filter, as 
in an adaptive equalizer operating on a repeated training pulse, the frequency-domain 
weights of an M - point Dentino filter converge to the DFT of the optimal M-point 
transversal filter. After the weights have converged, they may be used to process data 
using standard overlap and-save-techniques. This works well in an equalizer where no 
output is generated during the training sequence. Another example of the Dentino 
filter used simply for the information in the weights is given by Reed et. al. [22], who 
use the filter to perform bearing estimation. 


3 Proportional Convergence 

In this section we present the results of two experiments that illustrate the proportional 
convergence of the general frequency-domain adaptive filter. In the first experiment we 
present a signal consisting of two sinusoids to to a general 32-weight general frequency 
domain adaptive filter configured as an adaptive line enhancer. In this configuration, 
Xj = z~ s dj. This configuration is commonly used to improve signed to noise ratios 
in noisy “single microphone” data. The filter will enhance input signals that have a 
strong autocorrelation, while rejecting those whose autocorrelation is narrower than 
S. The first sinusoid has a frequency 0.25 times the sample frequency and the second 
has a frequency of 0.3125 times the sample frequency. This puts the first sinusoid in 
the 17th bin of a 64 point FFT, and the second in the 21st bin. The first sinusoid 
has an amplitude of 2.5 and the second 1.25 which gives a two to one relationship in 
amplitude. We also set the first 128 samples of the signal to zero. Thus we will have 
some zero output points before the sinusoids reach the filter. Finally, we analyze the 
output data by looking at the average magnitude in the two processed sinusoids as a 
function of output points. This is done for two cases. The first is the scalar-// case, 
which is implemented with a vector-// containing all ones. The second uses a vector-// 
tailored to the power in the two sinusoids. It has the 17th and 21st points (as well as 
the symmetric points) in the vector-// set to 0.000156 and 0.000626 respectively which 
is based on the inverse of the average power in these bins. All the other points in 
the vector -// are set to 0.002. The power in these bins is actually very close to zero, 
which would ordinarily result in a much larger // at these points, but this is undesirable 
since it may cause the filter to overflow. Instead we simply limit the minimum bin 
power. The result of these two experiments are shown in Figure 4. The two thin 
lines represent the scalar-// case, while the two thicker ones represent the output of 
the vector-// case. The // was chosen for the vector-// case so that the higher-power 
sinusoid was resolved at the same rate as in the scalar-// case. We can easily see that 
in the scalar-// case the lower-powered frequency was not resolved at the same rate as 
the higher-powered one. However, the vector-// allows the filter to follow a much more 
proportional convergence. 

For the second demonstration we use a frequency domain filter configured as a 
canceler. We apply speech spectrum noise to the reference (xj) input and the same 
noise filtered through a 32 point FIR filter applied to the desired ( dj ) input. This 
is a standard system identification problem. To minimize the mean square error, the 
adaptive filter weights will converge to the same values as the system filter. Using 
speech spectrum noise will give a disparate set of eigenvalues in R xx . In this case we 
analyze both the average spectral energy in the error output as well as the mean square 
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Figure 4: Output of a Two Sinusoid Enhancer 


error. As before, we do this for the case of a scalar-/!, as well as a vector-/! tailored to the 
spectral energy in the reference input. This latter was chosen by inverting the average 
power in the bins of a 64-point FFT over several overlapping 32 point sections of the 
reference input. Analyzing the mean squared error in this case of a scalar-/! gives the 
result shown in Figure 5. This is probably more accurately described as a ‘smoothed’ 



Figure 5: Mean Squared Error for a Scalar /! 


square error curve since it is generated by squaring the error and then smoothing it 
by averaging each point with a few neighboring points. By plotting this against a 
logarithmic scale, we see at least two different rate of convergence. This illustrates 
the fact that the filter resolves the weights associated with the larger eigenvalues of 
R xx first. Having only a scalar-/! limits the convergence rates of the weights associated 
with the smaller eigenvalues. 

Figure 6 shows a series of 4096 point spectral averages of the error output of the 
filter. Each average is based on 5 4096 point FFTs. The thin line represents the 
spectral energy in the reference input and each successively thicker line represents 
averages starting at 12800, 44800, and 172800 points into the output file respectively. 
We should point out that the mean square error plot stops about where the third 
trace begins. If we were to continue the mean square error curve we should be able to 
identify several more progressively decreasing rates of convergence. 
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Frequency 


Figure 6: Scalar p Processing 

By comparison, the mean square error and spectral average plots for a vector - p are 
given in Figures 7 and 8 respectively. In Figure 8 the averages are taken beginning 



Figure 7: Mean Squared Error for a Vector p 

at 12800, 38400, and 128000 points into the output file respectively. We see from both 
these figures that the vector -p gives us a much more proportional convergence. 

Both these demonstrations dealt with stationary problems, and used pre- determined 
vector-/xs. For non-stationary problems the vector-^ could be estimated using an ex- 
ponential average as in (27). Again, as we pointed out in the first demonstration, 
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occasionally we have to limit the largest values in the vector-/*, because even though 
the average power in those bins will be very small, a very large corresponding vector- /* 
element may cause the filter to overflow. 

We also need to guard against choosing the block length too large since, as ex- 
plained in Section 2, this may slow the overall convergence of the filter. Also, the 
filter will be less able to track any non-stationarities in the input. On the other hand, 
choosing longer FFTs allows us to better estimate A° and provide more proportional 
convergence. One way to balance these two requirements is to increase the number 
of points overlapped in each section. This decreases the number of points output at 
each iteration and increases the computational overhead but provides the advantage 
of more proportional convergence. 


4 A Frequency-Domain HR Adaptive Filter 


Next, we consider extending the frequency- domain techniques developed for the transver- 
sal adaptive filter to the recursive (or IIR) filter. This will include both the application 
of the DFT to perform fast block IIR convolution and the use of a vector-/* to make 
the adaptive process converge proportionally. First we present the LMS IIR adaptive 
filter as it was first developed in 1975 by White [31]. We will not concern ourselves 
here with questions of stability beyond those addressed by White. 

In the recursive filter, the output is a weighted sum not only of the current and 
past inputs but also of past outputs. 

N a ~ 1 Ni-1 

Vj = — biVj-i - 1 ( 28 ) 

a— 0 t=0 


The z-transform of the transfer function of the IIR filter is simply 


Y{z) 

X(z) 


JV.-l 

»=0 

N h -1 

1 + te -1 ' 1 

i= 0 


(29) 


As with the transversal LMS adaptive filter, we form an error sequence {e^} using 
the difference between the output {j/j} and a desired signal {dj}. Then we use the 
expected value of the square of the error as a performance criteria upon which to base 
decisions about how to adjust the forward and backward weights, a,- and b{ respectively. 
Unfortunately, the reclusive nature of the equations make analyzing E (^j the way 
we did with the FIR filter nearly impossible. Instead we simply proceed with the 
implementation. As before, we approximate E (e^ with e?, then we find the gradient 
of the squared error with respect to the weights and use this to adjust the weights. 
First, we observe V(ej) = 2 eyVej and 

T 

dyj dx/j dyj dyj dyj 

ddQ 5 da ,\ ' ’ da,N a -i * dbo ' ’ 56;v t -i 


Vej = V(dj - yj) = - 
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Now we need to find and jjji . But 


dyj d 


(Na-l 


N b -1 


= xT. 52 “kZj-k - 52 b kVj-k - 1 


dai da-i 


(30) 


= x 3-i ~ Y bk 

k = 0 


fe =0 k =0 

N b -i a 

dyj-k - 1 


dai 


and similarly, 


dyj u dyj-k - 1 

9j- = -W-i - £ 6 * 


fc =0 


06< 


(31) 


Thus the gradient estimate may itself be computed recursively. With suitable adaptive 
gains f. if and jib, the weight update equations are written 


dyj 

°*i+i = a b + 2ti f e jQ^ 

dv ' 

b 'j+ 1 = + 2 ^ be ^~dbi 


(32) 

(33) 


We observe that we can reduce the number of computations required to implement 
a recursive filter simply by performing FFT implemented block convolution on the 
forward part of the filter and then computing the feedback contribution a sample at 
a time for each sample in the block. However, our object here is to find a way to 
bring to bear the power of the FFT in order to reduce the number of computations in 
the feedback portion of the filter as well as the feedforward. Several ways to perform 
recursive convolution in a block fashion have been developed [36,38,39,40,41]. We will 
use an approach first proposed by Voelcker and Hartquist [37], but we will follow a 
simpler development and make an additional observation. If we consider (28) it first 
appears that there is no obvious way to perform block recursive convolution when the 
block length L is greater them 1, since the feedback portion of the filter will require 
inputs before they have been computed. This is true unless the backward weights 
b 0 , . . ., bL-i (with L < Nb) are identically zero. It would seem that this requirement 
would limit the types of problems to which such a filter structure could be applied. 
However, we will show that it is possible to add extra poles to the transfer function 
in such a way as to set bo through to zero. The effect of the extra poles can be 
compensated by adding equivalent zeros to the transfer function. To formalize these 
ideas, let us adopt a slightly different notation. First, we rewrite the filter equation as 


N a - 1 N b -1 

Vj — 52 a i X J-i ~ 52 

i = 0 »=0 


(34) 


where we now represent the first non-zero backward weight as bo- Then, the transfer 
function becomes 

N a ~ 1 


Y(z) 

X(z) 


52 ’ 

t=0 

N b - 1 

i + z~ L Y b i z ~ { 


i= 0 


(35) 
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To implement the feedback portion of this filter using block methods, the largest block 
that can be computed at a time will be L points long. Suppose we want to replace a 
recursive filter with N p forward weights (poiPi> • • • >PN r -i) and N q backward weights 
(g 0 , <h, . . . , gjv,-i) with a block filter that has a feedback block length L. Both systems 
will have to be equivalent, that is 


S Pi 2 * 


S °‘ z ’ 


1 + 2 1 ^2 q,z 1 l + z L ^2 b<z 1 

i = 0 t=0 


The order of the blocked denominator is Nb + L - 1, while that of the unblocked 
denominator is N q , so we will have to add N e poles and zeros to the unblocked system 
where 

N t = N b + L - N q - 1 (37) 

That is, we need to find (eo, ei, . . . , and (bo, 6j, . . . , 1 ) such that 



N b - 1 

= i + z~ L bi z ~ i 

i=0 


(38) 


Let us rewrite this equation as 

(1 + z~ l Q(z))E(z) = 1 + z~ l B(z) (39) 

where Q(z) = eSo" 1 ®*"*. B(z) = T&t'biZ-* and E(z) = 1 + z" 1 EST 1 ®*"*)- 
Then we write 

E(z) = 1 + z~ l B(z) - z~ i Q(z)E(z) (40) 

If we recognize that (40) describes E(z ) as the output sequence of an all-pole recursive 
filter with input sequence 1 + z~ L B(z), we can see that the first L points of the sequence 
(1, eo, . . . , eN e -i) correspond to the first L points of the impulse response of an all-pole 
filter with transfer function H(z ) = l/(l-\-z~ l Q(z)), which is just the feedback portion 
of the filter we are trying to model. Since Q(z) is general, we see that N e + 1 > L. 
To keep E(z) finite, we must choose B(z) to clear the filter. Conceptually we can do 
this by setting bj to the negative of the output of the filter e£+i +j for 0<j<N q -l. 
This will introduce N q zeros into the filter after which, provided there is no further 
input (ie: bj = 0 Vj > N q — 1), the filter produces no further output. This would lead 
to Nb = N q and N e = L - 1. We could let N e > L - 1 by not chosing the B(z ) to 
immediately clear the filter. This gives us some flexibility in choosing the initial terms 
of B(z). 

We can also make these same observations by expanding the left side of (38) and 
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equate the coefficients of the two polynomials. In matrix notation this is 


1 

0 

0 

0 

9o 

1 

0 

... Q 

9i 

9o 

1 

0 

i 

.. ^ .. 

9AT.-2 

QN.- 3 

1 

QN t - 1 

QN„- 2 

9N q - 3 

••• 9N q -N t - 2 

0 

QN q - 1 

2 

••• QN q -N'-l 

0 

0 

0 

QN q - 1 . 


1 

0 

0 


0 

bo 

h 


( 41 ) 


b N b - 1 J 


We note that the matrix is N q + N e + 1 by N e + 1 and that the rightmost vector has 
Nb + L elements which equals N q + N e + 1 using (37). Next, we observe that only the 
first N e + 1 rows of the equation are needed to solve for (eo, ei, . . . , eiv,-i)* Also, since 
the determinant of the submatrix formed from the first N e + 1 rows of the matrix is 
1, a unique solution exists. If L > N e + 1 however, rows N e + 2 through row L of the 
matrix equation will not, in general, satisfy the equality. By using (37), this constraint 
can be written as N g > Nb, that is, the equation is guaranteed to have a solution if 
N b > N q . If Nb = N q , (b 0 , bi,..., &iv fc -i) are explicitly determined, but in the case that 
Nb > N q , (bo, . . . , &jv fc -iv 9 -i) can be chosen arbitrarily 2 . Also notice from (41) that L 
can be chosen larger than N q . 

Thus we can determine the ej either by using matrix methods on the first N e + 1 
rows of (41) or, as we observed above, the first N e points of the impulse response of a 
recursive system whose feedforward weights are given by the first N e + 1 elements of 
the solution vector and whose feedback weights are given by the first N e + 1 elements 
of the first column of the matrix excluding the leading l. 3 The undetermined b terms 
are then found by performing the matrix operations indicated in the last N q rows of 
( 4 !). 

Once the terms (e 0 , ei, . . . , e^v.-i) have been determined, the block forward weights 
(do, dj, . . . , div 0 -i) can be found by adding an identical set of zeros to the system. That 


is. 


N a -i 


»=0 


< 

t=0 



(42) 


Hereafter, we will refer to 1 + z -1 Yli^o 1 as the auxiliary equation and it’s roots as 
auxiliaries. From this we see that the order of the feedforward portion of the block filter 
will have order N a — 1 where N a = N p +N e . With (37) we have N a = N p +Nb—N q +L—l. 
Although we can choose L to be smaller than Nb we will achieve greatest efficiency 
when L ~ Nb • In such a case, N a ~ 2 Nb + N p — N q — 1 and, with the constraint 
Nb > N q , we require N a «> Nb + N q — 1. In most cases of interest, therefore, N a will 
be larger than Nb- In the case that N p « N q and, with the fact that implementing the 
convolution with FFTs requires that N a and Nb must be powers of 2, then most likely 


2 Other considerations may determine how we select values for these terms. 

3 We note that these two approaches are mathematically equivalent. 
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we will need N a = 2 Nb- Often, it is undesirable to implement a filter that requires 
two different length FFTs. One approach is to use the larger length FFT for both 
convolutions but to use twice the amount of overlap in the feedback convolution as in 
the feedforward. A more efficient approach is based on the observation that a 2N-point 
convolution can be performed using two iV-point convolutions. That is, 

N a -l Na/2-1 N a - 1 

2 Ij ~ 53 a i X j-' = 53 a i X j~i "b 53 a i x j—i 

i = 0 i=0 t'=JVa/2 

Notice that each convolution involves half of the weights and the input to the second 
convolution is just the input to the first convolution delayed by N a j 2. If N a /2 is chosen 
to be equivalent to the block length L, then the input to the second convolution is the 
same as the input to the first except that it is delayed by one block. Thus, one FFT 
can be used for the input and, excluding the initial FFTs required for the weights, 
only three FFTs per block are required to implement a fixed weight recursive filter. 
The choice of L — N a /2 forces us to use 50% overlap but, by saving an extra FFT, it 
is likely the most efficient approach. We also have assumed in the preceding argument 
that we want to use the same block length for both of the forward and the backward 
block convolutions. Although it is possible to have a different block length for each 
convolution, the use of two different block lengths increases the storage requirements 
of the filter as well as its complexity. 

Presuming that we have chosen a suitable block length L which is larger than or 
equal to the larger of the required number of backward weights and half the required 
number of augmented forward weights, that is L > Nf, and 2 L > N a , then we pad N a 
to 2 L and Nb to L by adding zeros to each. From these we form three X-point weight 
vectors 

Aai = [a 0 ,ai,...,a£_ 1 ] r , Abi = [aL,OL+i, • • • , a 2 L-i] r , and Bi — [6 0 »&i, • • • , &£-i] T 

Now defining the matrix X; as we did for (11) (but with N = L) and the matrix Yj 
the same way, we can write the block recursion as 

Yj = XiAai + Xi-iAbi — Yj_i2?j 

where Yi is the Zth L point output vector (defined as Yj in (11). Following the procedure 
developed in Section 2 we augment these vectors and matrices making Yj“ j and Xf 
circulant. This gives 



or, simply, 

Yj a = XfAa? + X^Abf - Yf^Bf 

Then, applying DFTs we have 

DFT([x^_^ l ,. . .,Z(/ +1 )£_i]) ® DFT(Aaf) 1 
Yj a = DFT~ l +DFT([x {l _ 2)L ,...,x lL _ 1 ))®DFT(Ab f) i (44) 

-DFT([y {l _ 2)Lt . . . , sm-r]) 0 DFT(B\ f) J 
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which results in the structure diagramed in Figure 9 

Next, we need a method to adaptively update the coefficients of the block filter. 
As we did for the block FIR filter, we first form an X-point error vector Ei from 
the difference of the /th output block Y\ and the /th set of L points from the desired 
input, Di. Then we form the block mean square error and estimate it using only the 
information available at the /th iteration, yielding 

t B = E(j-E?Ei} « jEfEt 

and the block gradient estimate becomes 

v, = jVEfEi = ^{yy l ) T E l 


(45) 
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where 


(vr,) r = 


(— gft— y 

_ \d a (t + 2)L-l J 

«) 

( V 

- V st (l+1)4-I / 


9yiL+ i 

da o 
dl/IL + l 
da\ 


§ULL §VtL±l 

Sa (l+2)L-1 fi “(t+3)£-l 

flyir, dyiL+i 

ooo dbfj 

dyir. dyiL+\ 

dbT eh 





Sy(i+i)L-i 

da a 

d y(m)L-i 


dyti+pL-i 
9“(f+ 2)L-l 
d yp+i)L-i 


dyq+DL - 1 
db\ 


8 V(l+l)^-> 

di>(|+i)£-i 


From (34) we can find 

dyj » dyj-k-L , dyj-k-L / 

“ d wr- yi -‘- L ~ (47) 

As in the scalar case, (47) can be solved recursively. By comparing these equations 
to (34) and (35), we note that the recursions can be computed using block methods 
directly as diagramed in Figure 10. Then we compute the gradient estimate and update 



t 


Figure 10: The Block Recursive Filter Weight Partials 


the weights 


.. 2 p f V> X dyi L+ k - k . , sr dyii+k 

°*l+i a H + r Q e lL+k "i| + , — + T 2^t Ah. e lL+k 

L—C\ 


L t'o db <‘ 
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Figure 11: The Block Recursive Filter Weight Update 

where y is the adaptive gain. Thus, the weight update can be performed as shown in 
Figure 11. 

At long last, we have a complete block HR filter implemented using the FFT to 
perform fast convolution. To summarize, we present the complete algorithm. To do 
so, we define the vectors: 


Xj = *J£+i> • • • , *(i+i)L-i] 

Df = [d iL , di L+ i, d( /+ i)£_i] 

Yf - [vil, vil+ i, • • • , y(i+i)i-i] 

Ef = [e iL , e lL + 1, . . . , e( J+1)£ _ 1 ] ( 49 ) 

Aa^ — K, Oj, o-L— l] 

Ab T = [ a L , a L + 1, . . . , 021,-1] 

B T = [&oi&i»--->&L-i] 


Also, we define T to be a projection operator that sets the first L points of a vector to 
zero. With the FFT and it’s inverse represented by T and J r_1 , we first compute the 
feedforward contribution and subtract from that the feedback contribution computed 
at the previous iteration to generate L points of output. 


X? = nxli\\xn T 

y{ a = X? ® Aa a + Xf_i ® Ab a - Vf_ x 

[o T ||if ] r = TT-'yi a 

Ei = Di- V 


Next we update the forward weights 


o a ai i = riGZ^UGZF 

Z^=G a a ,®B 

[O r li ZZ\* = TT-'Z° ai 

^Oi(*+l) = [ X IL - *>•••» ~ E ai 

Ox = Oi + ^L~EfG ai (i+ 1 ) 


) Vi ; 0 < i < 2L - 1 


(50) 


( 51 ) 
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where G ai l — at each iteration and Z and Z are intermediate results. We then 
update the backward weights 


^bi == && ® ® 

[o T !l^] i ’ = rr-'z b ° 

Gbi(l+1) = - Z bi 

bi = k+ 2 -pE?G biil+1) 


> Vi ; 0 < i < L - 1 


(52) 


where G bi i = |^ at each iteration. Finally, we transform the weights back into the . 
frequency domain and compute the feedback contribution for the next iteration: 


Aa = ^ r [ J 4a T ||0 T ] T 
Ab = T[Ab T \\0 T } T 

B = F[B T \\0 T ] T (53) 

y? = r\Yim T \ T 

v, = y,‘ ® b 


Equations (50), (51), (52) and (53) constitute a frequency domain HR adaptive filter. 

Now we turn to the question of efficiency. As with the FIR adaptive filter, we 
restrict our focus to filters whose inputs are real sequences. 

With N p forward and N q backward weights, a time-domain HR filter requires N p + 
N q reed multiplies and real additions including the error calculation. There are N p + N q 
recursions required to compute the weight updates and each requires N q + 1 multiplies 
and N q + 1 additions including the sum for the actual weight update. Apart from the 
recursions we also need two multiplies to form pjej and m,ej. Altogether, the time- 
domain IIR filter requires {N p + N q )(N q -f 2) + 2 real multiplies and (N p + N q )(N q + 2) 
read additions for each output point. If we use 2 L point FFTs in the frequency domain 
filter, the feedback filter actually implements L feedback weights and 2 L feedforward 
weights. An equivalent time-domain filter would use the same number of feedback 
weights. From (37) using N q = L gives N p = L + 1. We also need to consider that 
the frequency domain filter produces L output points at each iteration. To produce L 
output points the equivalent time-domain filter requires 2 L 3 -f 5L 2 + iL real multiplies 
and 2 L 3 + 5 L 2 + 2 L real additions. 

The filter section of the frequency-domain filter (excluding the weight FFTs) uses 
3 FFTs, 3 ® operations and one three input Yj operation. Using the same FFT 
complexity estimates as we used for the frequency- domain FIR filter, the IIR filter 
section uses 6Alog 2 (X) — 2 real multiplies and 9Xlog 2 (A) + 18 L - 42 real additions. 
There are 3 L recursions of the type in Figure 10. Each uses two FFTs, one ® operation 
and one two input Y operation. Here, the sum occurs in the time domain and only 
requires L real additions. So weight paxtials require 12X 2 log 2 (T) — 12X 2 — 6 L real 
multiplies and 18X 2 log 2 (£) + 21X 2 — 78 L real additions. There are also 3 L weight 
recursions of the type shown in Figure 11. The ® operations involve L point real 
vectors in addition to the multiply by p b or ///, which can be performed after the Y- 
Each recursion requires L + 1 multiplies and L additions for each of the 3 L weights. 
Adding in the cost of the three weight FFTs gives the cost of the weight updates as 
3X 2 + 6£log 2 (A) — 9 L real multiplies and 3X 2 + 9Xlog 2 (X) + 6L - 36 real additions. 
Thus, the overall cost of computing L output points using the block recursive LMS 
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adaptive filter is 12(L + L 2 ) log 2 (i) — 9 I? — 15 L - 2 real multiplies and 24L 2 -f 18(1- + 
L 2 )\og 2 (L) + 901- — 78 real additions. 

Table 2 gives the ratio of frequency- domain to time-domain operations for several 
choices of L. By comparison, the ratios are a little worse than but still of the same 


L 

Multiplies 

Additions 

8 

1.376 

4.460 

16 

1.102 

2.850 

32 

0.758 

1.731 

64 

0.480 

1.015 

128 

0.289 

0.581 

256 

0.169 

0.327 

512 

0.096 

0.181 

1024 

0.054 

0.099 


Table 2: Ratio of Frequency Domain to Time Domain Calculations for the HR Filters 

order as those in Table 1. Here, we see that we need a filter length > 64 before we 
realize a computational savings. W hi le this filter length may be reasonable for a FIR 
adaptive filter, one of the reasons for adopting an nR approach is the hope of shorter 
filters. However, the use of block methods may allow faster and more proportional 
convergence even in smaller filters, justifying the added computational expense. 

4.1 Proportional Convergence 

The forward and backward filters are not strictly independent from each other and we 
cannot find a simple non-recursive expression for the optimum weights as we did for 
the FIR adaptive filter. If we take a somewhat heuristic approach and treat the filters 
as independent from each other, we can apply some of the same ideas that lead to the 
use of a vector-//. Following the same development that lead to (24) 

Aaf+1 = M + ‘;f (Ff V^y,) 0 

Abf + 1 = .46? + JFn (E?V Ab Y,y (54) 

Bf + 1 = Bf + (F, T v B y,) 0 

where we have used V Aa Yi to be the first L columns of VF/ and similarly for V A b 
and Vjg. The notation indicates the vector gradient with respect to each of the 
components of W. n represents a projection operator that selects the first L points of a 
vector and the the final term in each of the above equations is augmented by padding 
each with L zeroes. Noting that V Aa = [G O0 (/ + i), Go^z+i), • . • , ( ?o (I ,_ I) (/+i)] r , \ 7 Ab = 

[<?o i (J+i)><J ! a (I , +1) (i+i),--->C : a (2t _ 1) (/+i)3 :r » and ^ = Gboji+I), (?{,,(/+!), :,G f i, (t _ 1) (/ +1 )] r , 
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we adjust the weight updates in (51) and (52) as follows 


hBi = ^ E ?G bi{l+1) 
hAai = -yj-Ef G ai (i+1) 
h-Abi = 2j L- E T G a (iJrL) (l+ 1 ) 


Vi ; 0 < i < L — 1 


Aaf +1 = -Aa“ + njr-iA*- 1 jr[tfJJO r ] r 
Abf +1 = Abf + nr-'ki'F[H T Ab \\o T ) T 
B? +1 = Bf + n^- 1 A2' 1 ^[^|| 0 r ] r 


(55) 


where Ha<i = [/Ua 0 ) • • • > and HAb and Hg are defined similarly. The 

A’s can be estimated as they were in either (26) or (27). An exponential average is 
preferred for A£ since it is difficult to predict beforehand the inputs to the backward 
filter. If we use exponential averages, then 




K +l 


r^ + !+**»[*•*'] 

rFS*i + 


(56) 


Since the input to the backward filter is less predictable, one would usually choose 
(3 b to be larger than (3 /, and thus provide for a shorter window average. Notice that 
despite the fact that we have broken the forward convolution into two parts, we still 
use only one A“. Even if we used separate estimators for both forward filters, they 
would be very similar since, except for a block delay, the inputs are the same to both 
forward filters. In Section 5 we present the results of a frequency domain HR filter 
using both forward and backward vector-//s. Using a vector-// for the forward filter 
provides dramatic improvement in the case where the input to the filter is characterized 
by a large eigenvalue spread. The improvement resulting from using a vector-// in the 
backward filter is more difficult to assess but it appears to be useful. 


4.2 Feasibility 

Although adding auxiliary poles and zeroes allowed us to do the block recursion, and 
although we showed that we could always find the auxiliary roots, we cannot guarantee 
that the auxiliary poles and zeros will always lie on top of each other since we separately 
adjust the forward and backward weights 4 . The plots in Section 5 indicate in fact that 
they do not begin to coincide until the filter is almost converged. If the auxiliary root 
happens to lie outside the unit circle and if the auxiliary pole follows a path (in time 
or space) different from the one the auxiliary zero takes when outside the unit circle, 
then the filter will become unstable. Another possibility is that the adaptive filter 
may prevent the pole from moving outside the unit circle if the error surface gradient 
estimate is steep enough in the direction associated with this motion. Though this 
avoids instability, it prevents the filter from converging to the optimal solution. We say 
a solution is feasible if the all the poles, including the auxiliaries lie inside the unit circle. 
Voelcker and Hartquist [37] showed that if L is chosen large enough, all the auxiliaries 
will fall inside the unit circle. We have noted above that choosing N b larger than 

4 Even in a fixed filter, roundoff errors will cause the auxiliary poles and zeros not to coincide. 
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N q allows (6o> • • • , bjf b -N,-i) to be chosen arbitrarily. We give an example to illustrate 
that this choice can also affect whether or not a solution is feasible. Suppose we choose 
Nb = N q = L = 3, in particular a system with poles at (-.7,0) and (-.5, ±.5). The 
denominator of the system function is in this case 1 + 1.7z -1 + 1.2z -2 + .35z -3 . Setting 
up (41) for this case 



and solving for eo and e\ gives that the auxiliary equation is 1 — 1.7z -1 + 1.69z -2 which 
has roots outside the unit circle at (0.85, ±.983616). If we set up the same problem, 
but with Nb — 4 we have 

1 0 0 [ e 0 [ -1.7 

1.7 10 e x = -1.2 (58) 

1.2 1.7 1 J [ e 2 [b 0 - .35 

where we are free to choose bo. We notice that the choice of bo will only affect e 2 . 
e 0 and e\ will always be —1.7 and 1.69 respectively. Choosing e 2 = —.75 gives the 
auxiliary equation 1 — 1.7z _1 + 1.69z~ 2 — .75z -3 which has roots just inside the unit 
circle at (0.464943, ±0.870465) and (0.770114,0). To get this value for e 2 requires that 
we set bo = .433. 

Choosing Nb still larger the N q gives more degrees of freedom (and also introduces 
more auxiliaries). Whether this can be done in such a way as to guarantee that all 
solutions for problems of a particular order will be feasible, or whether an adaptive 
filter left to choose these “free” feedback weights will always choose a feasible solution 
if one is available, are questions that require further research. 

5 Frequency- Domain HR Experiments 

Here we present the results of an experiment to demonstrate both the frequency- domain 
HR filter itself, and the application of proportional convergence to the filter. 

In this set of experiments, we color a white noise source using a first order But- 
terworth filter designed with a cutoff of 0.1/ 4amp / e and implemented with an impulse 
invariant transform. This colored noise is then applied to a time-domain HR filter. The 
output of this filter is applied to the desired input of a frequency-domain block IIR 
adaptive filter implemented with 8 forward and 4 backward weights using 16 point 
FFTs. The colored noise is also applied to the reference input of the frequency- 
domain filter making this a system identification problem and the time-domain HR 
filter the system model. A random number generator forms the white noise source. 
The coloring filter has one forward and two backward weights which are [0.253195] and 
[—1.158046,0.411241], respectively. The model filter has four forward and three back- 
ward weights which are [1, —1, 1,-1] and [—.5,0.25] respectively. In each experiment 
we have used p = .002 for both the forward and backward sections of the filter, and 
at each time we process 2000 eight point blocks, saving the weights every 40th block. 

The filter will attempt to converge to the blocked version of the model weights. 
Here we have N p = 4, N q = 3, and Nb = L = 4. By (37), we have N e = 4, so there will 
be 4 auxiliaries. Also, Nb~N q = 1, so we Eire free to choose one of the auxiliaries. One 
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possible set is [(0.271844,0), (-.385923, ±.557571), (0,0)]. The poles should converge 
to [(.5, ±.5), (-.5, 0)], while the zeros should converge to [(0, ±1), (1,0)]. 

There are three separate experiments, each using a different set of values for (3f and 
/3ft, which are used to estimate the forward and backward vector-//s as. given in (56). 
If the A a s are initialized to the identity matrix, setting /3 = 0 is equivalent to using a 
constant scalar-//. In Case A, both /3/ and /3(, are set to 0. In Case B, we set /3 / to 
0.03 which gives an exponential average of about 17 blocks. Finally in Case C, we set 
both (3j and to 0.03. Plots of the pole-zero tracks and final pole-zero positions after 
2000 blocks for each case are shown in the following figures. We also plot the forward 
and backward weights after each 40 output blocks. 

We see that Case C, with both a forward and backward vector-//, converges in 2000 
blocks. Case B, with only a forward vector-//, also has the poles and zeroes converged, 
but the auxiliaries are still moving. Case A has one zero converged and some other 
poles and zeroes near convergence. The difference in the forward weight plots between 
Cases A and B illustrates how the vector-// improves the convergence, making it more 
proportional. Comparing the backward weight plots for Cases B and C shows that 
a backward vector-// may have caused some improvement, but it is difficult to judge. 
Still, it appears that the backward vector-// improved the overall rate of convergence. 
This is an area where more experimentation needs to be done. As we indicated in 
Section 4, a larger might be expected to perform better. 


6 Summary 

We have presented a general frequency-domain FIR adaptive filter that not only re- 
quires fewer computations than the time-domain LMS adaptive filter, but also pro- 
vides faster and more proportional convergence which preserves the signal’s spectral 
structure. We have also shown that the frequency-domain adaptive filters previously 
proposed by Dentino [8], Clark [9], Ferrara [10], Narayan and Peterson [11,17] and 
Mansour and Gray [15,16] are special cases of this general frequency-domain adap- 
tive filter. We have extended these ideas to the recursive LMS adaptive filter and 
shown that it is possible to reduce the number of computations from order L 3 to order 
L 2 \og 2 {L). At the same time, we have presented a way to use the FFT to perform 
fast HR convolution through the addition of auxiliary poles and zeros. We have shown 
that the convergence properties of the recursive LMS adaptive filter can be improved 
by using an appropriately chosen vector-//. It is possible to show [29] that we can 
use these fast convolution techniques to reduce the number of computations in the 
CHARF [33], SHARF [34], and Feintuch’s recursive LMS filter [32]. Both the general 
frequency- domain adaptive filter and the frequency-domain LMS recursive filter were 
implemented and tested. The results presented in this paper show the advantage of 
using a vector-//. 
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